A Quantum Langevin Formulation of Risk-Sensitive Optimal Control 
In this paper we formulate a risk-sensitive optimal control problem for continuously monitored
open quantum systems modelled by quantum Langevin equations. The optimal controller is
expressed in terms of a modified conditional state, which we call a risk-sensitive state, that
represents measurement knowledge tempered by the control purpose. One of the two components of
the optimal controller is dynamic, a filter that computes the risk-sensitive state. The second
component is an optimal control feedback function that is found by solving the dynamic
programming equation. The optimal controller can be implemented using classical electronics.
The ideas are illustrated using an example of feedback control of a two-level atom.

PACS numbers: 03.65.Ta,02.30.Yy,42.50.Lc 
I. INTRODUCTION

Recent years have seen significant advances in quantum technology, quantum information and
computing, continuous quantum measurements, and feedback control [?], [23]. Optimal feedback
control is an important methodology from classical control theory that is widely used for
control system design and has been applied to quantum systems, for example [?]. The designer
encodes the desired performance objectives (e.g. regulation) in a cost function, which is then
optimized. When the system to be controlled is subject to noise, uncertainty, or disturbances,
optimal feedback controllers are sought. Because of the intrinsic randomness in quantum
measurements, feedback control of quantum systems using classical electronics has close
connections to classical stochastic control theory [?].

In the context of classical linear systems subject to Gaussian noise, there are two main types
of cost functions. In linear quadratic Gaussian (LQG) design, the cost is an expected value of
an integral or sum of quadratic system variables; the cost is additive [25]. In contrast, in
linear exponential quadratic Gaussian (LEQG) control, the cost is multiplicative, being the
average of the exponential of an integral or sum [22], [33]. LEQG is also known as
risk-sensitive optimal control (LQG is sometimes referred to as risk-neutral optimal control).
More general formulations of these problems have been developed, e.g. for nonlinear stochastic
systems. It is known that risk-sensitive controllers enjoy enhanced robustness properties [?],
[?], and this provides an important motivation for the study of risk-sensitive problems. A
fundamental difference between the optimal solutions to the risk-neutral and risk-sensitive
problems is that the optimal risk-neutral controller is given in terms of the optimal state
estimator (the Kalman filter in the LQG case), while the optimal risk-sensitive controller is
given in terms of a quantity that takes into account the cost objective (a modified Kalman
filter in the LEQG case), and is in general not the optimal state estimator [?], [?].



Optimal feedback control problems for quantum systems using additive cost functions have been
considered in the literature [?]; we refer to these as risk-neutral. The optimal risk-neutral
controllers obtained in these papers are given in terms of a posteriori conditional states
(evolving according to stochastic master equations, quantum trajectory equations, or Belavkin
quantum filtering equations). In [23] a class of risk-sensitive optimal control problems was
considered. This class of problems was specified in discrete time using the framework of
quantum operations and conditional states. The optimal risk-sensitive controller is given in
terms of a modified unnormalized conditional state that takes into account the cost function.
This risk-sensitive state represents measurement knowledge available to the controller
tempered by the purpose of the designer, as in the classical case.

In this paper we formulate a risk-sensitive optimal control problem for continuously monitored
open quantum systems. We use a Markovian approximate model to describe the open quantum system
[20, Chapters 5 and 11]. This model is given by a quantum Langevin equation. Quantum stochastic
calculus and dynamic programming methods are used to study the optimal control problem. A
heuristic solution is given (as in classical continuous-time measurement feedback stochastic
optimal control, there are substantial technical issues). As in [23], the solution is given in
terms of a modified or risk-sensitive conditional state, and we present the corresponding
modified stochastic master equation, a risk-sensitive quantum filter. We also consider briefly
a risk-neutral problem, also formulated using quantum Langevin equations, to facilitate
connection with the results in the papers [?] via quantum filtering.

We begin in Section II by formulating the risk-sensitive problem. Then in Section III we show
how the cost function can be expressed as a stochastic representation in terms of the
risk-sensitive state mentioned above. The dynamic programming solution is discussed in Section
IV, and the risk-neutral problem is summarized in Section V. In Section VI we illustrate our
results in the context of feedback control of a two-level atom [?].
II. PROBLEM FORMULATION

We consider the problem of controlling an open quan- 
tum system model with the following features: 

1. The evolution can be influenced by control variables $u$ that enter the system
Hamiltonian $H(u)$.

2. The system $S$ interacts with two heat baths (electromagnetic field channels) $B_1$ and
$B_2$.

3. Channel $B_1$ is not monitored; its influence may be used, for example, to model
dissipative effects.

4. Channel $B_2$ is continuously monitored, providing weak measurements of the system $S$,
the results $y$ of which are available to the controller $K$, a classical system which
processes this information to produce the control actions $u$.

5. The control in general is allowed to be a causal function of the measurement trajectory
(not just a function of the current measurement value).

6. The controller $K$ is chosen so that it minimizes a suitable cost or performance function
$J(K)$.

The controls $u$ take values in a set $U$, say real or complex vectors of dimension $m$
($U = \mathbb{R}^m$ or $U = \mathbb{C}^m$). In general the set $U$ may be bounded or
unbounded. Of course, we can have multiple measured and unmeasured field channels, though we
use one of each for notational simplicity.

We now describe the ideal dynamics of the controlled system using quantum stochastic
differential equations [20], [30]. Let $u = u(\cdot)$ be a control signal (a function of time
$t$ with values $u(t) \in U$). Consider the interaction picture unitary operators
$U(t) = U^u(t)$ (often we omit explicit dependencies on $u$ from the notation) solving the
quantum stochastic differential equation (QSDE) [20, eq. (11.2.7)], [?, sec. 26],

$$
dU(t) = \{ -K(u(t))\,dt + L\,dB_1^\dagger(t) - L^\dagger dB_1(t)
          + M\,dB_2^\dagger(t) - M^\dagger dB_2(t) \}\, U(t) \tag{1}
$$

with initial condition $U(0) = I$, where

$$
K(u) = \frac{i}{\hbar} H(u) + \frac{1}{2} L^\dagger L + \frac{1}{2} M^\dagger M. \tag{2}
$$

Here, $L$ and $M$ are system operators which, together with the field operators
$b_1(t) = \dot{B}_1(t)$, $b_2(t) = \dot{B}_2(t)$, model the interaction of the system with the
channels. Note that equation (1) is written in Ito form (see, e.g. [?, Chapter 4]), as will
all stochastic differential equations in this paper. With vacuum initialization of the field
channels, the two non-zero Ito products are [20, eq. (11.2.6)]

$$
dB_1(t)\,dB_1^\dagger(t) = dt \quad \text{and} \quad dB_2(t)\,dB_2^\dagger(t) = dt. \tag{3}
$$



Then system operators $X$ evolve according to [39]

$$
X(t) = j_t(u, X) = U^\dagger(t)\, X\, U(t) \tag{4}
$$

and satisfy the quantum Langevin equation (QLE)

$$
\begin{aligned}
dX(t) ={}& \big( -X(t)K(t) - K^\dagger(t)X(t)
           + L^\dagger(t)X(t)L(t) + M^\dagger(t)X(t)M(t) \big)\,dt \\
         & + [X(t), L(t)]\,dB_1^\dagger(t) - [X(t), L^\dagger(t)]\,dB_1(t) \\
         & + [X(t), M(t)]\,dB_2^\dagger(t) - [X(t), M^\dagger(t)]\,dB_2(t)
\end{aligned} \tag{5}
$$

where $L(t) = j_t(u, L)$, $M(t) = j_t(u, M)$, and $K(t) = j_t(u, K(u(t)))$ (note the slight
abuse of notation regarding $K(\cdot)$).

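In terms of states, if $\pi_0$ is a given system state, we write

$$
\rho_0 = \pi_0 \otimes v_1 v_1^\dagger \otimes v_2 v_2^\dagger \tag{6}
$$

(the channels are initially in their vacuum states $v_1$ and $v_2$ respectively), and so the
state of the system plus channels at time $t$ is given by

$$
\rho(t) = U(t)\, \rho_0\, U^\dagger(t), \tag{7}
$$

so that

$$
\langle \rho_0, j_t(u, X) \rangle = \langle \rho(t), X \otimes I \otimes I \rangle. \tag{8}
$$

Here, we have used the notation

$$
\langle A, B \rangle = \mathrm{tr}[A^\dagger B], \tag{9}
$$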

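and the symbol $I$ denotes the appropriate identity operator. When $u(\cdot)$ is an open loop
signal, or simply constant (no feedback), we can trace out the field channels and obtain the
master equation for our setup. Indeed, if $\bar{\rho}(t)$ denotes the partial trace of
$\rho(t)$ obtained by tracing out the field channels, then $\bar{\rho}(t)$ solves the master
equation

$$
\begin{aligned}
\dot{\bar{\rho}}(t) &= -K(u(t))\bar{\rho}(t) - \bar{\rho}(t)K^\dagger(u(t))
                       + L\bar{\rho}(t)L^\dagger + M\bar{\rho}(t)M^\dagger \\
                    &= -\frac{i}{\hbar}[H(u(t)), \bar{\rho}(t)]
                       + \mathcal{D}[L]\bar{\rho}(t) + \mathcal{D}[M]\bar{\rho}(t)
\end{aligned} \tag{10}
$$

where $\mathcal{D}[c]\rho = c\rho c^\dagger - \frac{1}{2}(c^\dagger c \rho + \rho c^\dagger c)$
is the decoherence operator. The initial condition for (10) is $\bar{\rho}(0) = \pi_0$.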
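The equality of the two forms of the master equation (10) can be checked numerically. The following Python sketch does this for a hypothetical two-level system; the matrices `H`, `L`, `M`, `rho` and the choice `hbar = 1` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def dag(a):
    """Hermitian conjugate."""
    return a.conj().T

def D(c, rho):
    """Decoherence operator D[c]rho = c rho c^† - (c^†c rho + rho c^†c)/2."""
    cdc = dag(c) @ c
    return c @ rho @ dag(c) - 0.5 * (cdc @ rho + rho @ cdc)

def master_rhs_K(rho, H, L, M, hbar=1.0):
    """First form of (10), written with K = (i/hbar) H + L^†L/2 + M^†M/2."""
    K = 1j / hbar * H + 0.5 * dag(L) @ L + 0.5 * dag(M) @ M
    return -K @ rho - rho @ dag(K) + L @ rho @ dag(L) + M @ rho @ dag(M)

def master_rhs_lindblad(rho, H, L, M, hbar=1.0):
    """Second form of (10): commutator plus the two decoherence terms."""
    return -1j / hbar * (H @ rho - rho @ H) + D(L, rho) + D(M, rho)

# Hypothetical test data (assumed for illustration only).
sm = np.array([[0, 0], [1, 0]], dtype=complex)      # lowering operator
H = np.array([[0, 0.3], [0.3, 0]], dtype=complex)   # some Hermitian Hamiltonian
L, M = 0.6 * sm, 0.8 * sm
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]], dtype=complex)

rhs_a = master_rhs_K(rho, H, L, M)
rhs_b = master_rhs_lindblad(rho, H, L, M)
print(np.allclose(rhs_a, rhs_b))        # the two forms agree
print(abs(np.trace(rhs_a)) < 1e-12)     # the trace of rho is conserved
```

Both checks rely only on the definition (2) of $K(u)$: the anti-Hermitian part of $K$ produces the commutator, and the Hermitian part supplies the anticommutator terms of the decoherence operators.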

We regard the field operators $B_k(t)$, $k = 1, 2$, as input fields [20, Section 11.3.2],
with corresponding output fields $A_k(t)$ defined by

$$
A_k(t) = j_t(u, B_k(t)). \tag{11}
$$

The real quadratures of the input fields are defined by

$$
Q_k(t) = B_k(t) + B_k^\dagger(t), \tag{12}
$$

and we write

$$
Q(t) = (Q_1(t), Q_2(t))^T, \tag{13}
$$

a vector of independent quantum noises. For the output field real quadratures we write

$$
Y_1(t) = j_t(u, Q_1(t)), \quad \text{and} \quad Y_2(t) = j_t(u, Q_2(t)). \tag{14}
$$

These processes satisfy the QSDEs

$$
\begin{aligned}
dY_1(t) &= (L(t) + L^\dagger(t))\,dt + dQ_1(t) \\
dY_2(t) &= (M(t) + M^\dagger(t))\,dt + dQ_2(t).
\end{aligned} \tag{15}
$$



We continuously monitor the second channel, and measurement of $Y_2(t)$ produces a real
output measurement signal $y_2(t)$, which is used by a (classical) controller $K$ to produce
the input control signal $u(t)$ by

$$
u(t) = K(t, y_{2,[0,t]}). \tag{16}
$$

The notation used in (16) is meant to indicate the causal dependence of the control on the
measurements; $y_{2,[0,t]}$ indicates the segment of the measurement signal on the time
interval $[0, t]$, so in effect the controller $K = \{K(t, \cdot)\}$ is a family of functions.
To illustrate, consider the following two special cases: (i) static feedback, where
$u(t) = k(y_2(t))$, so that the control at time $t$ depends only on the measurement at time
$t$; (ii) dynamic feedback, where $u$ is determined by

$$
u(t) = h_K(\xi(t), y_2(t)), \tag{17}
$$

where $\xi(t)$ is an internal state of the controller driven by the measurement signal.

The complete dynamics of the open system and controller is obtained by combining the open
system evolution (1) or the QLE (5) with the control law (16). Equations (1) and (5) continue
to hold since $u$ depends causally on $y_2$.

We now specify the cost function that we will use to determine the "best" choice of
controller. It will be defined over a fixed time interval $[0, T]$. Let $C_1(u)$ be a
non-negative self-adjoint system operator depending on the control value $u$, and let $C_2$
be a non-negative self-adjoint system operator. These so-called cost operators are chosen to
reflect the performance objectives, and explicitly include the control so that a balance
between performance and control cost can be achieved (see Section VI for an example). The
quantity

$$
\int_0^T C_1(t)\,dt + C_2(T), \tag{18}
$$

where $C_1(t) = j_t(u, C_1(u(t)))$, $C_2(t) = j_t(u, C_2)$, accumulates cost over the given
time interval and provides a penalty for the final time (we again take the liberty of a slight
abuse of notation). Instead of using the expected value of the quantity (18) as a cost
function (risk-neutral case, see Section V), we consider the average of the exponential of
(18) in the following way. Define $R(t)$ to be the solution of the operator differential
equation

$$
\frac{dR(t)}{dt} = \frac{\mu}{2}\, C_1(t)\, R(t) \tag{19}
$$

with initial condition $R(0) = I$. Here, $\mu > 0$ is a positive (risk) parameter. The
solution of (19) can be expressed as the time-ordered exponential

$$
R(t) = \overleftarrow{\exp}\left( \frac{\mu}{2} \int_0^t C_1(s)\,ds \right). \tag{20}
$$

We then define the risk-sensitive cost function to be the quantum expectation

$$
J^\mu(K) = \langle \rho_0, R^\dagger(T)\, e^{\mu C_2(T)}\, R(T) \rangle. \tag{21}
$$

Here, $\rho_0 = \pi_0 \otimes v_1 v_1^\dagger \otimes v_2 v_2^\dagger$, as above.

The cost function (21) is one possible quantum generalization of the classical risk-sensitive
criterion. The operator ordering was chosen to be compatible with the evolution of operators
in the Heisenberg picture, thus facilitating the stochastic dynamical representations given
in Section III, which are needed for the dynamic programming solution in Section IV.



III. STOCHASTIC REPRESENTATION OF COST AND FILTERING



We wish to express the risk-sensitive cost (21) in terms of a (reduced) quantity defined on
the system space that is driven by the data $y_2(t)$, $0 \le t \le T$, obtained from the
continuous monitoring of the second field channel. To this end we first represent the cost
$J^\mu(K)$ as a classical expectation with respect to a reference Wiener distribution in terms
of (reduced) quantities defined on the system space, analogous to the stochastic
representations of quantum dynamical semigroups considered in [?], [21], and then we apply
classical filtering. Essentially, we monitor both channels and then average the results for
the first channel. This procedure does not "demolish" system variables due to the
commutativity property or quantum non-demolition (QND) condition: for all initial system
operators $X$,

$$
[Q(t), X] = 0 \quad \forall\ 0 \le t \le T; \tag{22}
$$

see [?, eq. (5.3.29)], [?], [?] (equation (22) is understood componentwise).

Before considering the cost representation, we discuss the statistics of the fields
[20, Chapters 5 and 11]. The input real quadrature operators $Q(t)$, $0 \le t \le T$, are
commutative,

$$
[Q_j(t), Q_k(s)] = 0 \quad \forall\ 0 \le s, t \le T,\ j, k = 1, 2, \tag{23}
$$

and when the fields are initialized in the vacuum states they correspond to a
(two-dimensional) real Wiener process (Brownian motion) $q(t) = (q_1(t), q_2(t))$ via the
Segal map [?, Chapter 5]. Indeed, let $\Omega_T$ denote the set of all Wiener paths
(continuous functions of time on the interval $[0, T]$). The probability of a subset
$F \subset \Omega_T$ of paths is

$$
\mathbf{P}^0(F) = \langle v_1 v_1^\dagger \otimes v_2 v_2^\dagger, P_T^Q(F) \rangle, \tag{24}
$$

where $P_T^Q(F)$ is the projection operator associated with $F$ and $Q(s)$,
$0 \le s \le T$. The probability distribution $\mathbf{P}^0$ is the Wiener distribution, under
which the increments $q(t) - q(s)$, $0 \le s \le t \le T$, are independent, Gaussian, with
zero mean and covariance $(t - s)I$ (here $I$ is the $2 \times 2$ identity matrix). The output
fields $Y(t) = j_t(u, Q(t))$ are also commutative,

$$
[Y_j(t), Y_k(s)] = 0 \quad \forall\ 0 \le s, t \le T,\ j, k = 1, 2, \tag{25}
$$

and satisfy the QND condition (cf. (22))

$$
[Y(s), X(t)] = 0 \quad \forall\ 0 \le s \le t \le T, \tag{26}
$$

see [?, Eq. (2.24)], [?, Eq. (8)], [?, Section 5.3]. The statistics of the continuously
observed channel is discussed below (see (52)).

We continue now with the representation of the cost. Define an operator $V(t)$, not unitary in
general, by

$$
V(t) = U(t)R(t). \tag{27}
$$

Then by the rules of quantum stochastic calculus [20, Section 5.3], [?, Chapter III],
[?, Chapter 5], $V(t)$ solves the QSDE

$$
dV(t) = \{ -K^\mu(u(t))\,dt + L\,dB_1^\dagger(t) - L^\dagger dB_1(t)
          + M\,dB_2^\dagger(t) - M^\dagger dB_2(t) \}\, V(t) \tag{28}
$$

with initial condition $V(0) = I$, where

$$
K^\mu(u) = K(u) - \frac{1}{2}\mu\, C_1(u). \tag{29}
$$

If we define

$$
j_t^\mu(u, X) = V^\dagger(t)\,(X \otimes I \otimes I)\,V(t), \tag{30}
$$

then we can write

$$
J^\mu(K) = \langle \rho_0, j_T^\mu(u, e^{\mu C_2}) \rangle. \tag{31}
$$
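The step from (27) to (28)-(29) and (31) can be spelled out as follows. This is a sketch under the assumptions already made in the text: $R(t)$ is differentiable in $t$ by (19), so it contributes no Ito correction, and $U(t)$ is unitary.

```latex
\begin{align*}
dV(t) &= dU(t)\,R(t) + U(t)\,dR(t) \\
      &= \{-K(u(t))\,dt + L\,dB_1^\dagger(t) - L^\dagger dB_1(t)
           + M\,dB_2^\dagger(t) - M^\dagger dB_2(t)\}\,U(t)R(t)
         + \tfrac{\mu}{2}\,U(t)\,C_1(t)\,R(t)\,dt, \\
U(t)\,C_1(t) &= U(t)\,U^\dagger(t)\,C_1(u(t))\,U(t) = C_1(u(t))\,U(t), \\
\Rightarrow\quad
dV(t) &= \{-\big(K(u(t)) - \tfrac{\mu}{2}C_1(u(t))\big)\,dt
           + L\,dB_1^\dagger(t) - L^\dagger dB_1(t)
           + M\,dB_2^\dagger(t) - M^\dagger dB_2(t)\}\,V(t), \\
R^\dagger(T)\,e^{\mu C_2(T)}\,R(T)
      &= R^\dagger(T)\,U^\dagger(T)\,e^{\mu C_2}\,U(T)\,R(T)
       = V^\dagger(T)\,e^{\mu C_2}\,V(T).
\end{align*}
```

The last line uses $e^{\mu U^\dagger C_2 U} = U^\dagger e^{\mu C_2} U$ for unitary $U$, with the tensor factors $\otimes I \otimes I$ suppressed.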

We now apply the technique developed in [?, Section 2] and described in [21, Chapter 5] for
stochastic representation of quantum semigroups
$\Phi_t(X) = \mathbb{E}_{v_1 \otimes v_2}[j_t(u, X)]$ to obtain a stochastic representation
of the cost. The key idea is that, with vacuum initialization, $V(t)$ can be viewed as a
function of the real quadratures of the fields, and by the Segal duality map (see
[?, Section 5.2.1]), this means that quantum expectations are equivalent to classical
expectations. The first point can be seen from the fact that $dB_1(t) v_1 = 0$, which means
that we can write

$$
(L\,dB_1^\dagger(t) - L^\dagger dB_1(t))\, v_1 = L\,dQ_1(t)\, v_1, \tag{32}
$$

and similarly for the second quadrature. The second point can be seen from (24), which can be
used to relate classical and quantum expectations. The result is that if we define a system
operator $\tilde{V}(t)$ acting on system state vectors $\psi$ to be the solution of the SDE

$$
d\tilde{V}(t) = \{ -K^\mu(u(t))\,dt + L\,dq_1(t) + M\,dq_2(t) \}\, \tilde{V}(t) \tag{33}
$$

where we have used the components of the Wiener process $q(t) = (q_1(t), q_2(t))$ which
describes the statistics of the real quadratures as mentioned above, then for any system
operator $X$

$$
\langle \rho_0, V^\dagger(t)\,[X \otimes I \otimes I]\,V(t) \rangle
  = \mathbf{E}^0[\langle \pi_0, \tilde{V}^\dagger(t)\, X\, \tilde{V}(t) \rangle], \tag{34}
$$

where $\mathbf{E}^0$ denotes expectation with respect to the reference probability
distribution $\mathbf{P}^0$ (recall (24)). Therefore if we write

$$
\tilde{j}_t^\mu(u, X) = \tilde{V}^\dagger(t)\, X\, \tilde{V}(t), \tag{35}
$$

we obtain

$$
J^\mu(K) = \mathbf{E}^0[\langle \pi_0, \tilde{j}_T^\mu(u, e^{\mu C_2}) \rangle], \tag{36}
$$

a stochastic representation of the risk-sensitive cost function with respect to the reference
Wiener distribution $\mathbf{P}^0$ (in terms of semigroups,
$\Phi_t(X) = \mathbb{E}_{v_1 \otimes v_2}[j_t(u, X)] = \mathbf{E}^0[\tilde{j}_t^\mu(u, X)]$
with $\mu = 0$).

Consider the operators $Y_2(t)$, $0 \le t \le T$, describing the real quadrature of the
second output field channel (and containing information about the interaction with the
system). These operators are also commutative, since from (25),

$$
[Y_2(t), Y_2(s)] = 0 \quad \forall\ 0 \le s, t \le T. \tag{37}
$$

Since we only measure the second field channel, we average the first component by computing
the classical conditional expectation

$$
\hat{j}_t^\mu(u, X) = \mathbf{E}^0[\tilde{j}_t^\mu(u, X) \mid q_2(s),\ 0 \le s \le t]. \tag{38}
$$

This is straightforward due to the independence of the two fields; indeed, we note that
$\tilde{j}_t^\mu(u, \cdot)$ satisfies, with respect to the reference distribution
$\mathbf{P}^0$, the SDE

$$
\begin{aligned}
d\tilde{j}_t^\mu(u, X) ={}& \tilde{j}_t^\mu\big(u, -XK^\mu(u(t)) - K^{\mu\dagger}(u(t))X
                            + L^\dagger X L + M^\dagger X M\big)\,dt \\
                          & + \tilde{j}_t^\mu(u, L^\dagger X + XL)\,dq_1(t)
                            + \tilde{j}_t^\mu(u, M^\dagger X + XM)\,dq_2(t)
\end{aligned} \tag{39}
$$

with corresponding output equation

$$
dy_2(t) = dq_2(t) \tag{40}
$$

(since the measurement values are $y_2(t) = q_2(t)$). Then, say from classical filtering
theory [?, Chapter 18], [38, Chapter 7], we have

$$
\begin{aligned}
d\hat{j}_t^\mu(u, X) ={}& \hat{j}_t^\mu\big(u, -XK^\mu(u(t)) - K^{\mu\dagger}(u(t))X
                          + L^\dagger X L + M^\dagger X M\big)\,dt \\
                        & + \hat{j}_t^\mu(u, M^\dagger X + XM)\,dy_2(t).
\end{aligned} \tag{41}
$$

By standard properties of classical conditional expectations (see, e.g. [?, Section 34]), we
have

$$
\mathbf{E}^0[\langle \pi_0, \tilde{j}_t^\mu(u, X) \rangle]
  = \mathbf{E}^0\big[\mathbf{E}^0[\langle \pi_0, \tilde{j}_t^\mu(u, X) \rangle
    \mid y_2(s),\ 0 \le s \le t]\big],
$$

and hence

$$
J^\mu(K) = \mathbf{E}^0[\langle \pi_0, \hat{j}_T^\mu(u, e^{\mu C_2}) \rangle]. \tag{42}
$$

Here, the expectation is with respect to $y_2(t)$ $(= q_2(t))$, a standard Wiener process
(since $q_1(t)$ has been averaged out in the definition of $\hat{j}_t^\mu(u, X)$).

We define an unnormalized risk-sensitive state $\sigma_t^\mu$ (which acts on system
operators) by

$$
\langle \sigma_t^\mu, X \rangle = \langle \pi_0, \hat{j}_t^\mu(u, X) \rangle. \tag{43}
$$

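Then $\sigma_t^\mu$ is the solution of the SDE

$$
\begin{aligned}
d\sigma_t^\mu ={}& \big( -K^\mu(u(t))\sigma_t^\mu - \sigma_t^\mu K^{\mu\dagger}(u(t))
                  + L\sigma_t^\mu L^\dagger + M\sigma_t^\mu M^\dagger \big)\,dt \\
                & + (M\sigma_t^\mu + \sigma_t^\mu M^\dagger)\,dy_2(t).
\end{aligned} \tag{44}
$$

Equivalently,

$$
\begin{aligned}
d\sigma_t^\mu ={}& -\frac{i}{\hbar}[H(u(t)), \sigma_t^\mu]\,dt
                  + \mathcal{D}[L]\sigma_t^\mu\,dt + \mathcal{D}[M]\sigma_t^\mu\,dt \\
                & + \frac{\mu}{2} H[C_1(u(t))]\sigma_t^\mu\,dt + H[M]\sigma_t^\mu\,dy_2(t),
\end{aligned} \tag{45}
$$

where $\mathcal{D}[c]\rho = c\rho c^\dagger - \frac{1}{2}(c^\dagger c\rho + \rho c^\dagger c)$
and $H[c]\rho = c\rho + \rho c^\dagger$. Equation (44) (or (45), or (57) below) is called the
risk-sensitive filter.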
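As a rough numerical illustration, the following Python sketch evaluates the drift and diffusion coefficients of the risk-sensitive filter in both forms, (44) and (45), and takes one Euler-Maruyama step. The discretization scheme, the function names, and the convention `hbar = 1` are assumptions made for this example; the paper does not prescribe a numerical method.

```python
import numpy as np

def dag(a):
    return a.conj().T

def D(c, rho):
    """Decoherence operator D[c]rho."""
    cdc = dag(c) @ c
    return c @ rho @ dag(c) - 0.5 * (cdc @ rho + rho @ cdc)

def Hlin(c, rho):
    """Linear superoperator H[c]rho = c rho + rho c^†."""
    return c @ rho + rho @ dag(c)

def coeffs_44(sigma, H, L, M, C1, mu):
    """Drift and diffusion of (44), written with K^mu = K - mu C1 / 2 (hbar = 1)."""
    K_mu = 1j * H + 0.5 * (dag(L) @ L + dag(M) @ M) - 0.5 * mu * C1
    drift = (-K_mu @ sigma - sigma @ dag(K_mu)
             + L @ sigma @ dag(L) + M @ sigma @ dag(M))
    return drift, M @ sigma + sigma @ dag(M)

def coeffs_45(sigma, H, L, M, C1, mu):
    """Drift and diffusion of (45), written with D[.] and H[.] (hbar = 1)."""
    drift = (-1j * (H @ sigma - sigma @ H) + D(L, sigma) + D(M, sigma)
             + 0.5 * mu * Hlin(C1, sigma))
    return drift, Hlin(M, sigma)

def euler_step(sigma, dy, dt, H, L, M, C1, mu):
    """One Euler-Maruyama step of the risk-sensitive filter driven by dy_2."""
    drift, diffusion = coeffs_44(sigma, H, L, M, C1, mu)
    return sigma + drift * dt + diffusion * dy

# Hypothetical two-level data (assumed for illustration only).
sm = np.array([[0, 0], [1, 0]], dtype=complex)
H = np.array([[0, 0.2], [0.2, 0]], dtype=complex)
L, M = 0.6 * sm, 0.8 * sm
C1 = np.diag([0.0, 1.0]).astype(complex)
sigma0 = np.diag([0.5, 0.5]).astype(complex)

sigma1 = euler_step(sigma0, dy=0.03, dt=0.01, H=H, L=L, M=M, C1=C1, mu=0.5)
```

In a real feedback loop the increment `dy` would come from the measurement record; for open-loop simulation under the reference distribution it could be drawn as a Gaussian with variance `dt`.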

The representation (42) becomes

$$
J^\mu(K) = \mathbf{E}^0[\langle \sigma_T^\mu, e^{\mu C_2} \rangle]. \tag{46}
$$

This expression is similar to the classical forms [?, eq. (3.4)], [?, eq. (2.10)], and will be
used in Section IV.



Next we point out that the risk-sensitive state $\sigma_t^\mu$ reduces to the standard
conditional state $\sigma_t$ when $\mu = 0$, and (44) or (45) reduce to the usual stochastic
master equation or Belavkin quantum filtering equation (e.g. [?, Chapter 5.2.5]); indeed

$$
\begin{aligned}
d\sigma_t ={}& \big( -K(u(t))\sigma_t - \sigma_t K^\dagger(u(t))
              + L\sigma_t L^\dagger + M\sigma_t M^\dagger \big)\,dt \\
            & + (M\sigma_t + \sigma_t M^\dagger)\,dy_2(t).
\end{aligned} \tag{47}
$$

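The corresponding normalized conditional state is given by

$$
\pi_t = \frac{\sigma_t}{\langle \sigma_t, 1 \rangle}, \tag{48}
$$

which satisfies the SDE

$$
\begin{aligned}
d\pi_t ={}& \big( -K(u(t))\pi_t - \pi_t K^\dagger(u(t))
           + L\pi_t L^\dagger + M\pi_t M^\dagger \big)\,dt \\
         & + \big( M\pi_t + \pi_t M^\dagger
           - \pi_t\, \mathrm{tr}[(M + M^\dagger)\pi_t] \big)\,dw(t),
\end{aligned} \tag{49}
$$

where $w(t)$ is a standard Wiener process (innovation) under a distribution $\mathbf{P}$ (to
be described shortly) related to $y_2(t)$ by

$$
dy_2(t) = \mathrm{tr}[(M + M^\dagger)\pi_t]\,dt + dw(t). \tag{50}
$$

Equivalently,

$$
d\pi_t = -\frac{i}{\hbar}[H(u(t)), \pi_t]\,dt + \mathcal{D}[L]\pi_t\,dt
         + \mathcal{D}[M]\pi_t\,dt + \mathcal{H}[M]\pi_t\,dw(t), \tag{51}
$$

where $\mathcal{H}[c]\rho = c\rho + \rho c^\dagger - \rho\, \mathrm{tr}(c\rho + \rho c^\dagger)$.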
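To make the role of the innovation concrete, the following Python sketch computes one increment of the normalized conditional state from (50) and (51). It is only an illustration: the Euler discretization, the function names, and `hbar = 1` are assumptions for this example.

```python
import numpy as np

def dag(a):
    return a.conj().T

def D(c, rho):
    """Decoherence operator D[c]rho."""
    cdc = dag(c) @ c
    return c @ rho @ dag(c) - 0.5 * (cdc @ rho + rho @ cdc)

def sme_increment(pi, dy, dt, H, L, M):
    """Increment of the normalized state pi_t for a measurement increment dy_2."""
    # Expected measurement rate tr[(M + M^†) pi], real for a valid state.
    m = np.trace((M + dag(M)) @ pi).real
    dw = dy - m * dt                        # innovation, from (50)
    drift = -1j * (H @ pi - pi @ H) + D(L, pi) + D(M, pi)
    diffusion = M @ pi + pi @ dag(M) - m * pi   # nonlinear term H[M]pi of (51)
    return drift * dt + diffusion * dw

# Hypothetical two-level data (assumed for illustration only).
sm = np.array([[0, 0], [1, 0]], dtype=complex)
H = np.array([[0, 0.2], [0.2, 0]], dtype=complex)
L, M = 0.6 * sm, 0.8 * sm
pi0 = np.array([[0.6, 0.3], [0.3, 0.4]], dtype=complex)

inc = sme_increment(pi0, dy=0.02, dt=0.01, H=H, L=L, M=M)
print(abs(np.trace(inc)) < 1e-12)   # the increment is traceless
```

The increment is traceless because both the decoherence terms and the nonlinear term vanish under the trace when $\mathrm{tr}\,\pi_t = 1$, so this update keeps the state normalized at each step.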

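The distribution $\mathbf{P}$ is defined on the set $\Omega_{2,T}$ of all possible
measurement paths $y_2(t)$, $0 \le t \le T$, of the operators $Y_2(s)$, $0 \le s \le T$. For
a subset $F_2 \subset \Omega_{2,T}$, the associated projection operator is denoted
$P_T^{Y_2}(F_2)$, and the corresponding probability is given by

$$
\begin{aligned}
\mathbf{P}(F_2) &= \langle \rho_0, P_T^{Y_2}(F_2) \rangle \\
                &= \langle \rho(T), P_T^{Q_2}(F_2) \rangle,
\end{aligned} \tag{52}
$$

where $\rho(t) = U(t)\rho_0 U^\dagger(t)$,
$\rho_0 = \pi_0 \otimes v_1 v_1^\dagger \otimes v_2 v_2^\dagger$. Note that the distribution
$\mathbf{P}$ depends on the controller $K$ in the feedback loop, so strictly
$\mathbf{P} = \mathbf{P}^K$.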

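We conclude this section with some alternative expressions for the risk-sensitive cost
together with the associated variants of the risk-sensitive state. These can be derived using
the stochastic calculus, e.g. [?], [?]. If we define a second unnormalized risk-sensitive
state by

$$
\hat{\sigma}_t^\mu = \frac{\sigma_t^\mu}{\langle \sigma_t, 1 \rangle} \tag{53}
$$

(note that the denominator is the normalization factor for the standard conditional state),
we obtain the representation

$$
J^\mu(K) = \mathbf{E}[\langle \hat{\sigma}_T^\mu, e^{\mu C_2} \rangle] \tag{54}
$$

with respect to the output distribution $\mathbf{P}$ (since
$d\mathbf{P} = \langle \sigma_T, 1 \rangle\, d\mathbf{P}^0$, [?, Chapter 5], [?, Section 32],
[?, Chapters 6 and 7]). A third representation can be obtained using the following normalized
risk-sensitive state

$$
\pi_t^\mu = \frac{\sigma_t^\mu}{\langle \sigma_t^\mu, 1 \rangle}, \tag{55}
$$

namely

$$
J^\mu(K) = \mathbf{E}^\mu\Big[ \exp\Big( \mu \int_0^T \mathrm{tr}(C_1(u(t))\pi_t^\mu)\,dt \Big)
           \langle \pi_T^\mu, e^{\mu C_2} \rangle \Big] \tag{56}
$$

where $\mathbf{E}^\mu$ denotes expectation with respect to the probability distribution
$\mathbf{P}^\mu$ defined by $d\mathbf{P}^\mu = \Lambda_T^\mu\, d\mathbf{P}^0$, where

$$
\Lambda_T^\mu = \exp\Big( -\frac{1}{2} \int_0^T |\mathrm{tr}[(M + M^\dagger)\pi_t^\mu]|^2\,dt
                + \int_0^T \mathrm{tr}[(M + M^\dagger)\pi_t^\mu]\,dy_2(t) \Big).
$$

The SDE satisfied by $\pi_t^\mu$ is

$$
\begin{aligned}
d\pi_t^\mu ={}& -\frac{i}{\hbar}[H(u(t)), \pi_t^\mu]\,dt + \mathcal{D}[L]\pi_t^\mu\,dt
               + \mathcal{D}[M]\pi_t^\mu\,dt \\
             & + \frac{\mu}{2}\mathcal{H}[C_1(u(t))]\pi_t^\mu\,dt
               + \mathcal{H}[M]\pi_t^\mu\,dw^\mu(t),
\end{aligned} \tag{57}
$$

where $w^\mu(t)$ is a standard Wiener process with respect to $\mathbf{P}^\mu$ defined by

$$
dy_2(t) = \mathrm{tr}[(M + M^\dagger)\pi_t^\mu]\,dt + dw^\mu(t). \tag{58}
$$

In the next section we use $\sigma_t^\mu$ and the representation (46) to show how dynamic
programming methods from optimal control theory can be applied to the risk-sensitive problem.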

The state $\pi_t$ (or equivalently $\sigma_t$) is the familiar a posteriori state of the
system conditioned on the measurement data, whose function is to facilitate the description of
behavior via calculation of conditional expected values of observable quantities, and thus is
a representation of measurement knowledge. See [?, Section IV] for an interesting discussion
of the reality of quantum trajectories and conditional states. The risk-sensitive state
$\pi_t^\mu$ (or equivalently $\sigma_t^\mu$, $\hat{\sigma}_t^\mu$) is also determined by the
measurement data, but through dynamics that contain the cost term $C_1$; it is by this
mechanism that knowledge is tempered by purpose, which reflects the prescriptive nature of
control theory. The risk-sensitive state is properly understood in the context of the
risk-sensitive feedback control problem, since it is a suitable state in terms of which the
risk-sensitive problem can be solved. This is a different use of measurement information than
is standard in quantum mechanics (see [10] for a short comparison of descriptive and
prescriptive sciences). All of these states are examples of what are called information states
in classical control theory [?], [24].

We remark that the quantum formulation of the risk-sensitive problem given here corresponds in
discrete time to [23, case (i) of Example 6]. The state $\hat{\sigma}_t^\mu$ was used (there
denoted $\omega_k$), and the modified stochastic master equation in discrete time is
[23, eq. (39)].

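We close this section by noting that if the field $Y_2(t)$ is measured with efficiency
$0 < \eta < 1$, that is, if we measure

$$
Z(t) = \sqrt{\eta}\, Y_2(t) + \sqrt{1 - \eta}\, Y_3(t) \tag{59}
$$

instead of $Y_2(t)$, where $Y_3(t)$ is the real quadrature of a third (and independent)
field, then the risk-sensitive filter equation (44) becomes

$$
\begin{aligned}
d\sigma_t^\mu ={}& \big( -K^\mu(u(t))\sigma_t^\mu - \sigma_t^\mu K^{\mu\dagger}(u(t))
                  + L\sigma_t^\mu L^\dagger + M\sigma_t^\mu M^\dagger \big)\,dt \\
                & + \sqrt{\eta}\,(M\sigma_t^\mu + \sigma_t^\mu M^\dagger)\,dz(t).
\end{aligned} \tag{60}
$$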

IV. DYNAMIC PROGRAMMING 

In this section we show how the dynamic programming method can be used to determine the
optimal controller. We make use of the representation (46) and the fact that the state
$\sigma_t^\mu$ evolves in time according to the dynamics (44) driven by the measurement data
$y_2(\cdot)$.

The method of dynamic programming works by defining, for each time $t$ and state $\sigma$,
the optimal value of the cost from time $t$ to the final time $T$. This optimal value is
called the value function. By considering the equation that the value function satisfies,
called the dynamic programming equation or Hamilton-Jacobi-Bellman equation, the optimal
control that should be used at time $t$ when in state $\sigma$ can be determined. The dynamic
programming equation for the value function is solved backwards, from the final time $t = T$
to the initial time $t = 0$. The equation is an infinitesimal statement of the principle of
optimality [?, Chapter VI], [?, Chapter 6].
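The backward-in-time structure described above is easiest to see in a classical, discrete-time, fully observed analog. The Python sketch below solves a finite-state risk-sensitive (exponential-of-sum) problem by backward recursion. It is not the quantum problem of this paper: the Markov chain `P`, the costs `c` and `c_T`, and the function name are all hypothetical and serve only to illustrate the principle of optimality for a multiplicative cost.

```python
import numpy as np

def risk_sensitive_dp(P, c, c_T, mu, N):
    """Backward recursion for min E[exp(mu * (sum_t c(x_t, u_t) + c_T(x_N)))].

    P[u, x, y] : probability of moving from state x to y under control u
    c[x, u]    : running cost, c_T[x] : terminal cost, N : number of stages
    Returns the value table S[t, x] and a minimizing policy pol[t, x].
    """
    n_x, n_u = c.shape
    S = np.zeros((N + 1, n_x))
    pol = np.zeros((N, n_x), dtype=int)
    S[N] = np.exp(mu * c_T)                    # terminal condition
    for t in range(N - 1, -1, -1):             # solved backwards in time
        # Q[x, u] = exp(mu c(x, u)) * sum_y P[u, x, y] S[t + 1, y]
        Q = np.exp(mu * c) * np.einsum("uxy,y->xu", P, S[t + 1])
        pol[t] = Q.argmin(axis=1)              # minimize over control values
        S[t] = Q.min(axis=1)
    return S, pol

# Hypothetical two-state, two-control example.
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.1, 0.9]]])
c = np.array([[0.0, 1.0], [2.0, 0.5]])
c_T = np.array([0.0, 1.0])
S, pol = risk_sensitive_dp(P, c, c_T, mu=0.5, N=3)
print(S[0], pol[0])
```

Note that the cost-to-go multiplies across stages rather than adding, which is the discrete counterpart of the multiplicative cost used in the risk-sensitive criterion.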

Define the risk-sensitive value function $S^\mu(\sigma, t)$ for an arbitrary initial
unnormalized state $\sigma$ and initial time $0 \le t \le T$ by

$$
S^\mu(\sigma, t) = \inf_K \mathbf{E}^0_{\sigma,t}[\langle \sigma_T^\mu, e^{\mu C_2} \rangle],
\tag{61}
$$

where $\sigma_T^\mu$ denotes the solution of (44) at time $T$ with initial condition
$\sigma_t^\mu = \sigma$ (we have made explicit the dependence on the initial state and time in
the expectation notation). Note that the cost (21) is given by

$$
J^\mu(K) = \mathbf{E}^0_{\pi_0,0}[\langle \sigma_T^\mu, e^{\mu C_2} \rangle] \tag{62}
$$

so that the optimal controller $K^{\mu,*}$ is determined by

$$
J^\mu(K^{\mu,*}) = S^\mu(\pi_0, 0). \tag{63}
$$

The method of dynamic programming in this context relates the value function at time $t$ and
at a later time $t \le s \le T$ along optimal trajectories via the relation

$$
S^\mu(\sigma, t) = \inf_K \mathbf{E}^0_{\sigma,t}[S^\mu(\sigma_s^\mu, s)]. \tag{64}
$$

This is the principle of optimality. Note that by definition the terminal value is
$S^\mu(\sigma, T) = \langle \sigma, e^{\mu C_2} \rangle$. The dynamic programming relation may
be considered in differential form (subject to mathematical technicalities), resulting in the
dynamic programming PDE (see (66) below). To see this, let $h > 0$, set $s = t + h$ in (64),
re-arrange and divide by $h$:

$$
0 = \inf_K \mathbf{E}^0_{\sigma,t}\left[
      \frac{S^\mu(\sigma_{t+h}^\mu, t + h) - S^\mu(\sigma, t)}{h} \right]. \tag{65}
$$

Sending $h \downarrow 0$ yields

$$
\begin{aligned}
\frac{\partial}{\partial t} S^\mu(\sigma, t)
  + \inf_{u \in U} \mathcal{L}^{\mu,u} S^\mu(\sigma, t) &= 0, \quad 0 \le t < T, \\
S^\mu(\sigma, T) &= \langle \sigma, e^{\mu C_2} \rangle.
\end{aligned} \tag{66}
$$

Note that in the dynamic programming PDE (66), the minimization is over the control values
$u$, whereas in the definitions of the cost (21) and value function (61) the minimizations are
over the controllers $K$.

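We now explain the meaning of the operator $\mathcal{L}^{\mu,u}$ appearing in the dynamic
programming PDE (66), following classical stochastic control methods [?]. For a fixed constant
control value $u$ (i.e. $u(t) = u \in U$ for all $t$), $\sigma_t^\mu$ is a Markov process with
generator $\mathcal{L}^{\mu,u}$, which is defined, when it exists, by

$$
\mathcal{L}^{\mu,u} f(\sigma)
  = \lim_{t \downarrow 0} \frac{\mathbf{E}^0_{\sigma,0}[f(\sigma_t^\mu)] - f(\sigma)}{t}
\tag{67}
$$

for suitably smooth functions $f(\cdot)$. In fact, $\mathcal{L}^{\mu,u} f(\sigma)$ can be
calculated explicitly for $f$ of the form

$$
f(\sigma) = g(\langle \sigma, X_1 \rangle, \ldots, \langle \sigma, X_J \rangle), \tag{68}
$$

where $g$ is a smooth bounded function of vectors of length $J$, and $X_1, \ldots, X_J$ are
system operators. Indeed, for fixed $u \in U$ and for such $f$, we have

$$
\begin{aligned}
\mathcal{L}^{\mu,u} f(\sigma)
  ={}& \frac{1}{2} \sum_{j,k=1}^{J}
       g_{jk}(\langle \sigma, X_1 \rangle, \ldots, \langle \sigma, X_J \rangle)\,
       \langle \sigma, M^\dagger X_j + X_j M \rangle
       \langle \sigma, M^\dagger X_k + X_k M \rangle \\
     & + \sum_{j=1}^{J}
       g_j(\langle \sigma, X_1 \rangle, \ldots, \langle \sigma, X_J \rangle)\,
       \langle \sigma, -K^{\mu\dagger}(u) X_j - X_j K^\mu(u)
         + L^\dagger X_j L + M^\dagger X_j M \rangle
\end{aligned} \tag{69}
$$

where $g_j$ and $g_{jk}$ denote first and second order partial derivatives of $g$.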

If the dynamic programming PDE has a sufficiently smooth solution $S^\mu(\sigma, t)$, then the
optimal controller $K^{\mu,*}$ can be obtained as follows. Let $u^{\mu,*}(\sigma, t)$ denote
the control value that attains the minimum in (66) for each $\sigma$, $t$. The optimal
controller is obtained by combining this function with the risk-sensitive filter (44):

$$
K^{\mu,*}: \quad
\begin{aligned}
d\sigma_t^\mu ={}& \big( -K^\mu(u(t))\sigma_t^\mu - \sigma_t^\mu K^{\mu\dagger}(u(t))
                  + L\sigma_t^\mu L^\dagger + M\sigma_t^\mu M^\dagger \big)\,dt \\
                & + (M\sigma_t^\mu + \sigma_t^\mu M^\dagger)\,dy_2(t) \\
u(t) ={}& u^{\mu,*}(\sigma_t^\mu, t).
\end{aligned} \tag{70}
$$

This controller is a dynamical controller, of the general form (17). The structure of this
controller is illustrated in Figure 1, where it is shown in closed loop with the quantum
system being controlled. The controller $K^{\mu,*}$ can be implemented in classical
electronics (e.g. analog circuit or digital computer). The filter is the implementation of the
dynamics (44) for the risk-sensitive state $\sigma_t^\mu$, which as described at the end of
Section III represents the controller's knowledge of the physical system tempered by its
purpose. The feedback control function $u^{\mu,*}(\sigma, t)$ is determined by solving the
dynamic programming equation backwards; this computation can be done offline, with the results
stored and available for online use. The risk-sensitive filter is, of course, solved online
while the quantum system is being controlled.
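The split between the online filter and the offline feedback function can be sketched as a small Python class. Everything here is an illustrative assumption: the class name, the Euler-Maruyama discretization, `hbar = 1`, and in particular the callable `u_star`, which stands in for the function $u^{\mu,*}(\sigma, t)$ that would be computed offline from the dynamic programming equation.

```python
import numpy as np

def dag(a):
    return a.conj().T

class RiskSensitiveController:
    """Risk-sensitive filter (discretized) followed by a feedback map.

    u_star(sigma, t) is a placeholder for the offline dynamic programming
    solution; H_of_u and C1_of_u return the Hamiltonian and cost operator
    for a given control value. hbar = 1 is assumed.
    """

    def __init__(self, sigma0, H_of_u, L, M, C1_of_u, mu, u_star, dt):
        self.sigma = np.array(sigma0, dtype=complex)
        self.H_of_u, self.C1_of_u = H_of_u, C1_of_u
        self.L, self.M, self.mu = L, M, mu
        self.u_star, self.dt = u_star, dt
        self.t = 0.0
        self.u = u_star(self.sigma, self.t)

    def update(self, dy):
        """Consume one measurement increment dy_2 and return the next control."""
        L, M, s = self.L, self.M, self.sigma
        K_mu = (1j * self.H_of_u(self.u)
                + 0.5 * (dag(L) @ L + dag(M) @ M)
                - 0.5 * self.mu * self.C1_of_u(self.u))
        drift = -K_mu @ s - s @ dag(K_mu) + L @ s @ dag(L) + M @ s @ dag(M)
        self.sigma = s + drift * self.dt + (M @ s + s @ dag(M)) * dy
        self.t += self.dt
        self.u = self.u_star(self.sigma, self.t)   # online lookup of the control
        return self.u
```

In use, `update` would be called once per sampling interval with the latest measurement increment, and its return value applied to the system until the next sample.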

Note that we may, of course, alternatively carry out dynamic programming and express the
controller in terms of the states $\pi_t^\mu$, $\hat{\sigma}_t^\mu$; in fact, appropriate
normalization is important for practical reasons.



V. RISK-NEUTRAL OPTIMAL CONTROL



In this section we briefly discuss a risk-neutral problem of the type that has been studied
by [?].



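[Figure 1: block diagram. The physical system (QLE, eqn. (5)) is driven by the input $u$ and
produces the output $y$. The feedback controller $K^{\mu,*}$ consists of a filter (state
$\sigma_t^\mu$, eqn. (44)) feeding the control function $u^{\mu,*}(\sigma_t^\mu, t)$.]

FIG. 1: Optimal risk-sensitive controller $K^{\mu,*}$ in closed loop with the physical system
being controlled.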



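Specifically, we consider the risk-neutral problem defined by the quantum expectation

$$
J(K) = \Big\langle \rho_0, \int_0^T C_1(t)\,dt + C_2(T) \Big\rangle, \tag{71}
$$

where as before, $\rho_0 = \pi_0 \otimes v_1 v_1^\dagger \otimes v_2 v_2^\dagger$. The key
step in solving the optimal control problem specified by (71) is again a stochastic
representation followed by classical conditional expectation, as in Section III, which results
in

$$
\begin{aligned}
J(K) &= \mathbf{E}\Big[ \int_0^T \langle \pi_t, C_1(u(t)) \rangle\,dt
        + \langle \pi_T, C_2 \rangle \Big] \\
     &= \mathbf{E}^0\Big[ \int_0^T \langle \sigma_t, C_1(u(t)) \rangle\,dt
        + \langle \sigma_T, C_2 \rangle \Big]
\end{aligned} \tag{72}
$$

where $\pi_t$ and $\sigma_t$ are the conditional states, assuming interchanges of expectations
and integrals are justified. The risk-neutral value function can now be defined by

$$
W(\sigma, t) = \inf_K \mathbf{E}^0_{\sigma,t}\Big[
  \int_t^T \langle \sigma_s, C_1(u(s)) \rangle\,ds + \langle \sigma_T, C_2 \rangle \Big]
\tag{73}
$$

and the corresponding dynamic programming equation reads

$$
\begin{aligned}
\frac{\partial}{\partial t} W(\sigma, t)
  + \inf_{u \in U} \{ \mathcal{L}^u W(\sigma, t) + \langle \sigma, C_1(u) \rangle \} &= 0,
  \quad 0 \le t < T, \\
W(\sigma, T) &= \langle \sigma, C_2 \rangle,
\end{aligned} \tag{74}
$$

where, for fixed control value $u \in U$, $\mathcal{L}^u$ is the generator of the Markov
process $\sigma_t$, and is given by

$$
\begin{aligned}
\mathcal{L}^u f(\sigma)
  ={}& \frac{1}{2} \sum_{j,k=1}^{J}
       g_{jk}(\langle \sigma, X_1 \rangle, \ldots, \langle \sigma, X_J \rangle)\,
       \langle \sigma, M^\dagger X_j + X_j M \rangle
       \langle \sigma, M^\dagger X_k + X_k M \rangle \\
     & + \sum_{j=1}^{J}
       g_j(\langle \sigma, X_1 \rangle, \ldots, \langle \sigma, X_J \rangle)\,
       \langle \sigma, -K^\dagger(u) X_j - X_j K(u)
         + L^\dagger X_j L + M^\dagger X_j M \rangle
\end{aligned} \tag{75}
$$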
for functions $f$ of the form (68).

If the dynamic programming equation (74) has a sufficiently smooth solution $W(\sigma, t)$,
then the optimal controller $K^*$ is given by

$$
K^*: \quad
\begin{aligned}
d\sigma_t ={}& \big( -K(u(t))\sigma_t - \sigma_t K^\dagger(u(t))
              + L\sigma_t L^\dagger + M\sigma_t M^\dagger \big)\,dt \\
            & + (M\sigma_t + \sigma_t M^\dagger)\,dy_2(t) \\
u(t) ={}& u^*(\sigma_t, t),
\end{aligned} \tag{76}
$$

where $u^*(\sigma, t)$ attains the minimum in (74). The dynamical part of this controller is
the Belavkin quantum filter (47).



VI. FEEDBACK CONTROL OF A TWO-LEVEL 
ATOM 

In this section we consider the application of the risk-sensitive control problem to the
example studied in [?], namely the feedback control of a two-level atom using a laser.

The amplitude and phase of the input laser can be adjusted, so via the interaction with the
laser the atom can be controlled. The real quadrature of a second field channel is
continuously monitored, say by homodyne detection, providing an indirect measurement of the
atom. The control input is complex, $u = u_r + i u_i = |u| e^{i \arg u} \in \mathbb{C}$ (the
control field channel becomes a coherent state corresponding to $u$). The measurement signal
$y_2(t)$ is real. It is desired to regulate the system in the $\sigma_z$ up state
$|\uparrow\rangle = (1, 0)^T$ (the down state is $|\downarrow\rangle = (0, 1)^T$, and

$$
\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \quad \text{and} \quad
\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
$$

are the Pauli matrices).

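In terms of the notation used in this paper, we have

$$
L = \kappa_f \sigma_-, \quad M = \kappa_s \sigma_-, \quad H(u) = i(u^* L - u L^\dagger),
$$

$$
\kappa_f^2 + \kappa_s^2 = 1, \quad \sigma_- = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},
$$

$$
C_1(u) = a \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} + \frac{1}{2} b |u|^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
C_2 = c \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad a \geq 0, \ b \geq 0, \ c \geq 0.
$$

Here, $\kappa_f^2$ and $\kappa_s^2$ are the decay rates into the control and
measurement channels. The parameters $a$, $b$ and $c$ are
weights for the components of the cost. Note that
$\langle \downarrow | C_1(u) | \downarrow \rangle > 0$ and $\langle \downarrow | C_2 | \downarrow \rangle > 0$ (if $a > 0$ and $c > 0$),
while $\langle \uparrow | C_1(0) | \uparrow \rangle = 0$ and $\langle \uparrow | C_2 | \uparrow \rangle = 0$, reflecting
the control objective.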
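The operator definitions above can be checked with a few lines of NumPy. This is a minimal sketch, assuming the basis ordering $|\uparrow\rangle = (1,0)^T$, $|\downarrow\rangle = (0,1)^T$ and taking $\sigma_-$ as the lowering operator that maps the up state to the down state; the helper names `operators` and `expect` and the numerical parameter values are illustrative, not from the paper.

```python
import numpy as np

# Basis ordering: |up> = (1, 0)^T, |down> = (0, 1)^T.
up = np.array([1.0, 0.0], dtype=complex)
down = np.array([0.0, 1.0], dtype=complex)
sigma_minus = np.array([[0, 0], [1, 0]], dtype=complex)  # lowers |up> to |down>
P_down = np.array([[0, 0], [0, 1]], dtype=complex)       # projector onto |down>
I2 = np.eye(2, dtype=complex)

def operators(u, kf, ks, a, b, c):
    """Return L, M, H(u), C_1(u), C_2 for the two-level atom example."""
    L = kf * sigma_minus
    M = ks * sigma_minus
    H = 1j * (np.conj(u) * L - u * L.conj().T)
    C1 = a * P_down + 0.5 * b * abs(u) ** 2 * I2
    C2 = c * P_down
    return L, M, H, C1, C2

def expect(psi, A):
    """Expectation <psi|A|psi> of a Hermitian operator A in the pure state psi."""
    return np.vdot(psi, A @ psi).real
```

For example, with `kf, ks = 0.6, 0.8` (so that the squares sum to one), `expect(down, C1)` is positive whenever `a > 0`, while `expect(up, C1)` vanishes at `u = 0`, which is the sense in which the cost encodes regulation to the up state.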

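We use the framework described in previous sections
to solve the optimal risk-sensitive control problem. Since
the second ($u$-dependent) part of $C_1(u)$ is proportional
to the identity, that part commutes with all operators
and it is convenient to factor its contribution to the
risk-sensitive state by writing

$$
\begin{aligned}
\sigma^\mu_t &= \frac{1}{2} \left( n(t) I + x(t)\sigma_x + y(t)\sigma_y + z(t)\sigma_z \right)
 \exp\left( \frac{1}{2} \mu b \int_0^t |u(s)|^2 ds \right) \\
&= \frac{1}{2} \begin{pmatrix} n(t) + z(t) & x(t) - i y(t) \\ x(t) + i y(t) & n(t) - z(t) \end{pmatrix}
 \exp\left( \frac{1}{2} \mu b \int_0^t |u(s)|^2 ds \right).
\end{aligned}
$$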



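Then substitution into the SDE (44) shows that the
coefficients satisfy the SDEs

$$
\begin{aligned}
dn(t) &= \tfrac{1}{2}\mu a \,(n(t) - z(t))\,dt + \kappa_s x(t)\,dy_2(t) \\
dx(t) &= -\tfrac{1}{2}(1 - \mu a)\,x(t)\,dt + 2\kappa_f u_r(t) z(t)\,dt + \kappa_s (n(t) + z(t))\,dy_2(t) \\
dy(t) &= -\tfrac{1}{2}(1 - \mu a)\,y(t)\,dt - 2\kappa_f u_i(t) z(t)\,dt \\
dz(t) &= -(1 - \tfrac{1}{2}\mu a)\,z(t)\,dt - (1 + \tfrac{1}{2}\mu a)\,n(t)\,dt
 - 2\kappa_f (u_r(t) x(t) - u_i(t) y(t))\,dt - \kappa_s x(t)\,dy_2(t).
\end{aligned}
\tag{77}
$$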
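The substitution step can be checked numerically by comparing the matrix form of the filter with the coefficient form (77). The sketch below assumes that the risk-sensitive filter (44) has the same structure as the Belavkin filter with $K(u)$ replaced by $K^\mu(u) = iH(u) + \frac{1}{2}L^\dagger L + \frac{1}{2}M^\dagger M - \frac{1}{2}\mu a\,|\downarrow\rangle\langle\downarrow|$, i.e. with the identity part of $C_1(u)$ already factored out; this form, and all function names, are assumptions of the sketch rather than statements from the text.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
sm = np.array([[0, 0], [1, 0]], dtype=complex)   # sigma_minus
P_down = 0.5 * (I2 - sz)

def to_matrix(n, x, y, z):
    return 0.5 * (n * I2 + x * sx + y * sy + z * sz)

def to_coeffs(A):
    # For A = (n I + x sx + y sy + z sz) / 2, tr(A P) recovers the coefficient of P.
    return np.array([np.trace(A @ P).real for P in (I2, sx, sy, sz)])

def matrix_terms(sig, u, mu, a, kf, ks):
    """dt and dy_2 coefficients of the matrix SDE (assumed form of (44))."""
    L, M = kf * sm, ks * sm
    H = 1j * (np.conj(u) * L - u * L.conj().T)
    Kmu = (1j * H + 0.5 * L.conj().T @ L + 0.5 * M.conj().T @ M
           - 0.5 * mu * a * P_down)
    drift = (-Kmu @ sig - sig @ Kmu.conj().T
             + L @ sig @ L.conj().T + M @ sig @ M.conj().T)
    noise = M @ sig + sig @ M.conj().T
    return drift, noise

def coeff_terms(n, x, y, z, u, mu, a, kf, ks):
    """dt and dy_2 coefficients of (n, x, y, z) read off from (77)."""
    ur, ui = u.real, u.imag
    drift = np.array([
        0.5 * mu * a * (n - z),
        -0.5 * (1 - mu * a) * x + 2 * kf * ur * z,
        -0.5 * (1 - mu * a) * y - 2 * kf * ui * z,
        -(1 - 0.5 * mu * a) * z - (1 + 0.5 * mu * a) * n
        - 2 * kf * (ur * x - ui * y),
    ])
    noise = np.array([ks * x, ks * (n + z), 0.0, -ks * x])
    return drift, noise
```

Calling both functions at the same point, with `kf**2 + ks**2 == 1`, and passing the matrix results through `to_coeffs` should give matching drift and noise vectors; setting `mu * a = 0` gives the risk-neutral coefficients.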

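The representation (46) reads

$$
J^\mu(\mathbf{K}) = \mathbf{E}^0\left[ \exp\left( \tfrac{1}{2}\mu b \int_0^T |u(s)|^2 ds \right) \tfrac{1}{2}\,(n(T) - z(T))\, e^{\mu c} \right].
\tag{78}
$$

We consider the value function (61) as a function of the
coefficients, i.e. $S^\mu(n,x,y,z,t)$. In terms of these parameters,
the dynamic programming equation is

$$
\begin{aligned}
\frac{\partial}{\partial t} S^\mu(n,x,y,z,t) + \inf_{u \in \mathbf{C}} \Big\{ \mathcal{L}^{\mu,u} S^\mu(n,x,y,z,t)
 + \tfrac{1}{2}\mu b |u|^2 S^\mu(n,x,y,z,t) \Big\} &= 0, \quad 0 < t < T, \\
S^\mu(n,x,y,z,T) &= \tfrac{1}{2}(n - z)\, e^{\mu c},
\end{aligned}
\tag{79}
$$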

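where the operator $\mathcal{L}^{\mu,u}$ is given, for sufficiently smooth
functions $f(n,x,y,z)$, by

$$
\begin{aligned}
\mathcal{L}^{\mu,u} f ={}& \tfrac{1}{2}\kappa_s^2 x^2 f_{nn} + \tfrac{1}{2}\kappa_s^2 (n+z)^2 f_{xx} + \tfrac{1}{2}\kappa_s^2 x^2 f_{zz} \\
&+ \kappa_s^2 x (n+z) f_{nx} - \kappa_s^2 x^2 f_{nz} - \kappa_s^2 (n+z) x f_{xz} \\
&+ f_n \left( \tfrac{1}{2}\mu a (n - z) \right) + f_x \left( -\tfrac{1}{2}(1 - \mu a) x + 2\kappa_f u_r z \right) \\
&+ f_y \left( -\tfrac{1}{2}(1 - \mu a) y - 2\kappa_f u_i z \right) \\
&+ f_z \left( -(1 - \tfrac{1}{2}\mu a) z - (1 + \tfrac{1}{2}\mu a) n - 2\kappa_f (u_r x - u_i y) \right).
\end{aligned}
$$

Here, the subscripts $f_{nx}$, etc, refer to partial derivatives,
and the arguments $n, x, y, z$ have been omitted.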

To construct the optimal risk-sensitive controller $\mathbf{K}^{\mu,\star}$,
we suppose that (79) has a smooth solution, which we
write as

$$
S^\mu(n,x,y,z,t) = n \exp\left( \frac{\mu}{n} W^\mu(n,x,y,z,t) \right).
\tag{80}
$$

The minimum over $u$ in (79) can be explicitly evaluated
by setting the derivatives of the expression in the parentheses
(it is convex in $u$) with respect to $u_r$ and $u_i$ to zero.
The result is

$$
\begin{aligned}
u_r^{\mu,\star}(n,x,y,z,t) &= \frac{2\kappa_f}{b n} \left( x W^\mu_z(n,x,y,z,t) - z W^\mu_x(n,x,y,z,t) \right) \\
u_i^{\mu,\star}(n,x,y,z,t) &= \frac{2\kappa_f}{b n} \left( z W^\mu_y(n,x,y,z,t) - y W^\mu_z(n,x,y,z,t) \right).
\end{aligned}
\tag{81}
$$

The optimal risk-sensitive controller is then

$$
\mathbf{K}^{\mu,\star} : \quad u(t) = u_r^{\mu,\star}(n(t),x(t),y(t),z(t),t) + i\, u_i^{\mu,\star}(n(t),x(t),y(t),z(t),t),
\tag{82}
$$

where $n(t)$, $x(t)$, $y(t)$ and $z(t)$ are given by (77).
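The feedback law can be written as a small function of the filter coefficients and the partial derivatives of $W^\mu$. The sketch below assumes the representation (80) in the form $S^\mu = n \exp(\mu W^\mu / n)$, so that $S^\mu_x = (\mu/n) W^\mu_x S^\mu$ and similarly for $y$ and $z$; the function names are hypothetical, and the derivatives of $W^\mu$ are taken as given inputs (in practice they would come from a numerical solution of (79)).

```python
import math

def s_from_w(n, w, mu):
    """Assumed representation (80): S = n * exp(mu * W / n)."""
    return n * math.exp(mu * w / n)

def u_star(n, x, y, z, w_x, w_y, w_z, kf, b):
    """Feedback law (81) built from the partial derivatives of W."""
    ur = 2.0 * kf / (b * n) * (x * w_z - z * w_x)
    ui = 2.0 * kf / (b * n) * (z * w_y - y * w_z)
    return complex(ur, ui)

def u_terms(u, x, y, z, s, s_x, s_y, s_z, kf, b, mu):
    """The u-dependent terms inside the infimum of (79), with f = S."""
    ur, ui = u.real, u.imag
    return (s_x * 2.0 * kf * ur * z
            - s_y * 2.0 * kf * ui * z
            - s_z * 2.0 * kf * (ur * x - ui * y)
            + 0.5 * mu * b * abs(u) ** 2 * s)
```

Because `u_terms` is quadratic in `u` with leading coefficient `0.5 * mu * b * s`, it has a unique minimizer when `mu * b * s > 0`, and evaluating it at `u_star(...)` and at nearby perturbed values is a quick consistency check on the formulas.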

Note that the dynamic programming equation (79)
(which is a partial differential equation of parabolic type)
is solved backwards in time, using the terminal condition
specified: $S^\mu(n,x,y,z,T) = \frac{1}{2}(n - z)e^{\mu c}$. The infimum in
(79) can be removed by substituting in the optimal control
values given by the explicit formulas (81), if desired.
However, the form (79) is better suited to numerical
computation, since the optimal control structure is preserved,
[26]. Note that in this example, the risk-sensitive filter
(44) is replaced by the finite-dimensional SDE (77); this
fact is important for practical computational reasons.

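Finally, we consider the risk-neutral problem. Write

$$
\sigma_t = \frac{1}{2} \left( n(t) I + x(t)\sigma_x + y(t)\sigma_y + z(t)\sigma_z \right).
\tag{83}
$$

Then, from the risk-neutral filter SDE for $\sigma_t$, we find that

$$
\begin{aligned}
dn(t) &= \kappa_s x(t)\,dy_2(t) \\
dx(t) &= -\tfrac{1}{2} x(t)\,dt + 2\kappa_f u_r(t) z(t)\,dt + \kappa_s (n(t) + z(t))\,dy_2(t) \\
dy(t) &= -\tfrac{1}{2} y(t)\,dt - 2\kappa_f u_i(t) z(t)\,dt \\
dz(t) &= -z(t)\,dt - n(t)\,dt - 2\kappa_f (u_r(t) x(t) - u_i(t) y(t))\,dt - \kappa_s x(t)\,dy_2(t).
\end{aligned}
\tag{84}
$$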
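The finite-dimensional filter (84) is simple to integrate numerically. The following is a minimal Euler-Maruyama sketch, assuming that under the reference measure the signal $y_2$ is a standard Wiener process, so that its increments can be drawn as independent Gaussians with variance `dt`; in an actual controller `dy` would be the measured increment instead. The function names and the control schedule `u_of_t` are illustrative.

```python
import numpy as np

def risk_neutral_step(state, u, dy, dt, kf, ks):
    """One Euler-Maruyama step of (84) for state = (n, x, y, z)."""
    n, x, y, z = state
    ur, ui = u.real, u.imag
    dn = ks * x * dy
    dx = (-0.5 * x + 2 * kf * ur * z) * dt + ks * (n + z) * dy
    dyc = (-0.5 * y - 2 * kf * ui * z) * dt   # coefficient y has no noise term
    dz = (-z - n - 2 * kf * (ur * x - ui * y)) * dt - ks * x * dy
    return np.array([n + dn, x + dx, y + dyc, z + dz])

def simulate(state0, u_of_t, T, steps, kf, ks, seed=0):
    """Integrate (84) on [0, T] with simulated measurement increments."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    state = np.array(state0, dtype=float)
    for k in range(steps):
        # Assumption: y_2 is a Wiener process under the reference measure.
        dy = rng.normal(0.0, np.sqrt(dt))
        state = risk_neutral_step(state, u_of_t(k * dt), dy, dt, kf, ks)
    return state
```

The normalized Bloch vector is obtained as `state[1:] / state[0]`. With zero control, the down state `(1, 0, 0, -1)` is a fixed point of these equations for every noise realization, which gives an easy sanity check of an implementation.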

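The risk-neutral representation (72) becomes

$$
J(\mathbf{K}) = \mathbf{E}^0\left[ \tfrac{1}{2} \int_0^T \left( a (n(t) - z(t)) + b |u(t)|^2 \right) dt + \tfrac{1}{2} (n(T) - z(T))\, c \right],
\tag{85}
$$

and the dynamic programming equation is

$$
\begin{aligned}
\frac{\partial}{\partial t} W(n,x,y,z,t) + \inf_{u \in \mathbf{C}} \Big\{ \mathcal{L}^u W(n,x,y,z,t)
 + \tfrac{1}{2} \left( a (n - z) + b |u|^2 \right) \Big\} &= 0, \quad 0 < t < T, \\
W(n,x,y,z,T) &= \tfrac{1}{2}(n - z)\, c,
\end{aligned}
\tag{86}
$$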

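where

$$
\begin{aligned}
\mathcal{L}^u f ={}& \tfrac{1}{2}\kappa_s^2 x^2 f_{nn} + \tfrac{1}{2}\kappa_s^2 (n+z)^2 f_{xx} + \tfrac{1}{2}\kappa_s^2 x^2 f_{zz} \\
&+ \kappa_s^2 x (n+z) f_{nx} - \kappa_s^2 x^2 f_{nz} - \kappa_s^2 (n+z) x f_{xz} \\
&+ f_x \left( -\tfrac{1}{2} x + 2\kappa_f u_r z \right) + f_y \left( -\tfrac{1}{2} y - 2\kappa_f u_i z \right) \\
&+ f_z \left( -z - n - 2\kappa_f (u_r x - u_i y) \right).
\end{aligned}
$$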



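Evaluating the minimum in (86) gives

$$
\begin{aligned}
u_r^\star(n,x,y,z,t) &= \frac{2\kappa_f}{b} \left( x W_z(n,x,y,z,t) - z W_x(n,x,y,z,t) \right) \\
u_i^\star(n,x,y,z,t) &= \frac{2\kappa_f}{b} \left( z W_y(n,x,y,z,t) - y W_z(n,x,y,z,t) \right),
\end{aligned}
\tag{87}
$$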

cf. 0, eq. (15)]. The optimal risk-neutral controller is 

$$
\mathbf{K}^\star : \quad u(t) = u_r^\star(n(t),x(t),y(t),z(t),t) + i\, u_i^\star(n(t),x(t),y(t),z(t),t),
\tag{88}
$$

where $n(t)$, $x(t)$, $y(t)$ and $z(t)$ are given by (84). Note
that normalization of (84) results in 0, eq. (7)].

Note that the expressions for both the risk-sensitive
and risk-neutral controllers are similar, and involve a similar
level of complexity for implementation. When $a = 0$,
the risk-sensitive SDEs (77) reduce to the risk-neutral
or standard SDEs (84), though the controllers will be
different in general.

VII. CONCLUSION 

In this paper we have studied a risk-sensitive optimal 
control problem for open quantum systems. The model 
we used for continuously monitored open quantum sys- 
tems is given by a quantum Langevin equation. Using 
quantum stochastic calculus and dynamic programming, 
we showed how to formulate and solve the risk-sensitive 
optimal control problem. The optimal controller we ob- 
tained is shown in Figure ^ It has two components. 
One component is dynamic, a filter that computes the 
risk-sensitive state. The second component is an optimal 
control feedback function that is found by solving the 
dynamic programming equation. The optimal controller 
can be implemented using classical electronics. This pro- 
cedure is computationally intensive, due to the storage 
requirements of the feedback control function, and the 
speed demands of online solution of the risk-sensitive fil- 
tering equations. 

A significant feature of the optimal control solution is 
the use of the risk-sensitive state. This differs in general
from the conditional state usually used in quantum physics,
because the filter that computes it contains
terms corresponding to the cost function specifying the
control objective. Such cost terms do not appear in the 
conventional quantum trajectory equations or Belavkin 
quantum filtering equations. One could say that the 
risk-sensitive state describes knowledge of the physical 
system being controlled, but tempered by the control 
purpose, and thereby represents measurement informa- 
tion in a way that is suitable for this feedback control 
problem. Consideration of this issue may be of interest. 

One of the motivations for considering risk-sensitive 
optimal control is the enhanced robustness properties rel- 
ative to risk-neutral (e.g. LQG) control. Robustness of 
a control system concerns its ability to cope with
performance-degrading influences of uncertainty and noise.
In the case of quantum systems, decoherence is a key 
limiting factor in the development of quantum technolo- 
gies, and it is therefore important to design controllers 
which are also robust with respect to decoherence. Con- 
sequently, evaluation of the robustness properties of risk- 
sensitive control for open quantum systems is an important
topic of investigation.

We also mention that for quantum systems with 
quadratic Hamiltonians and Gaussian initial states, the 
risk-sensitive state is also Gaussian. This fact has important
practical implications which are beginning to be
investigated, [34].